
Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings


Abstract

In settings where only unlabelled speech data is available, speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text. A similar problem is faced when modelling infant language acquisition. In these cases, categorical linguistic structure needs to be discovered directly from speech audio. We present a novel unsupervised Bayesian model that segments unlabelled speech and clusters the segments into hypothesized word groupings. The result is a complete unsupervised tokenization of the input speech in terms of discovered word types. In our approach, a potential word segment (of arbitrary length) is embedded in a fixed-dimensional acoustic vector space. The model, implemented as a Gibbs sampler, then builds a whole-word acoustic model in this space while jointly performing segmentation. We report word error rates in a small-vocabulary connected digit recognition task by mapping the unsupervised decoded output to ground truth transcriptions. The model achieves around 20% error rate, outperforming a previous HMM-based system by about 10% absolute. Moreover, in contrast to the baseline, our model does not require a pre-specified vocabulary size.
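The abstract does not say how an arbitrary-length segment is mapped into the fixed-dimensional space. As a minimal sketch, assuming one simple option (uniform downsampling of frame-level features such as MFCCs, which the text above does not confirm), the Python below picks a fixed number of evenly spaced frames and concatenates them. The names `downsample_embed`, `cosine_similarity`, and `n_samples` are hypothetical, and the Gibbs sampler and whole-word acoustic model are not shown.

```python
import math
from typing import List, Sequence

# Hypothetical type alias: one acoustic feature frame (e.g. a vector of MFCCs).
Frame = Sequence[float]


def downsample_embed(segment: Sequence[Frame], n_samples: int = 10) -> List[float]:
    """Map a variable-length run of frames to a fixed-length vector.

    Assumed technique (uniform downsampling): pick n_samples frames at evenly
    spaced positions and concatenate them, so the result always has
    n_samples * frame_dim entries, however many frames the segment has.
    """
    if not segment:
        raise ValueError("segment must contain at least one frame")
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    n_frames = len(segment)
    embedding: List[float] = []
    for i in range(n_samples):
        # Evenly spaced index in [0, n_frames - 1]. Segments shorter than
        # n_samples repeat frames; longer ones skip frames.
        idx = 0 if n_samples == 1 else round(i * (n_frames - 1) / (n_samples - 1))
        embedding.extend(segment[idx])
    return embedding


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 2-dimensional "frames": the same rising pattern spoken fast (4 frames)
    # and slowly (10 frames), plus a falling pattern as a different "word".
    fast = [[float(t), 1.0] for t in range(4)]
    slow = [[t / 3, 1.0] for t in range(10)]
    other = [[float(3 - t), -1.0] for t in range(4)]

    e_fast, e_slow, e_other = (downsample_embed(s, n_samples=5) for s in (fast, slow, other))
    print(len(e_fast), len(e_slow))  # 10 10: same dimension despite different lengths
    print(cosine_similarity(e_fast, e_slow) > cosine_similarity(e_fast, e_other))  # True
```

Because every candidate segment ends up with the same dimensionality, a model defined over fixed-length vectors can score any segment regardless of its duration, which is the property the approach described in the abstract relies on.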
